Antipattern: Use a Generic Attribute Table
Let's create a generic attribute table and observe how it affects a database.
We'll cover the following
The solution that appeals to some programmers when they need to support variable attributes is to create a second table, storing attributes as rows. See the diagram below showing the two tables.
Each row in this attribute table has three columns:
-
The Entity: Typically this is a foreign key to a parent table that has one row per entity.
-
The Attribute: This is simply the name of a column in a conventional table, but we have to identify the attribute on each given row.
-
The Value: Each entity has a value for each of its attributes.
For example, a given bug is an entity we identify by its primary key value 1234. It has an attribute called status. The value of that attribute for bug 1234 is NEW.
This design is called Entity-Attribute-Value (EVA). It’s also sometimes called open schema, schemaless, or name-value pairs.
Let’s create
the Issues
table and the IssueAttributes
table by using the following code.
After creating the tables, let’s insert the data as given.
Test it yourself in the following playground.
By adding one additional table, we seem to gain the following benefits:
-
Both tables have a few columns.
-
The number of columns doesn’t need to grow to support new attributes.
-
We avoid a clutter of columns that contain null in rows where the attribute is inapplicable.
This appears to be an improved design. However, the simple database structure doesn’t make up for the difficulty of using it.
Querying an attribute#
Imagine that your boss needs to run a report of the bugs reported per day. In a conventional table design, the Issues
table would have a simple attribute column such as date_reported
. The statement for creating the conventional Issues
table is given below:
You can insert the data in the Issues
table, as follows:
To query all bugs with their report dates, your boss could use a simple query like this:
To get the same information as the previous query using the EAV design, your boss needs to fetch rows from the IssueAttributes
table that stores an attribute named by the string date_reported
. This query is more verbose but less clear.
Reconstructing a row#
It’s natural to retrieve a row from the Issues
table with all its attributes in columns. We want to fetch an issue in a single row as though it were stored in a conventional table. We would like to view the records like in the following table.
issue_id | date_reported | status | priority | description |
---|---|---|---|---|
1234 | 2009-06-01 | NEW | HIGH | Saving does not work |
Because each attribute is stored on a separate row of the IssueAttributes
table, retrieving them all as part of a single row requires a JOIN
for each attribute. We must know all attributes at the time we write this query. The following query reconstructs the row shown earlier:
We must use OUTER JOIN
operations because INNER JOIN
operations would cause the query to return no rows if any of the attributes were not present in the IssueAttributes
table. As the number of attributes increases, so does the number of JOIN
operations, and the cost of this query increases exponentially.
However, if we need a query to support such a high number of JOIN
operations that we’re concerned about exceeding the database’s limits, we may have a problem with our database design. It’s common for an EAV design to lead to this problem.